Overview
Brought to you by YData
Dataset statistics
| | Dataset A | Dataset B |
|---|---|---|
| Number of variables | 12 | 12 |
| Number of observations | 446 | 446 |
| Missing cells | 433 | 449 |
| Missing cells (%) | 8.1% | 8.4% |
| Duplicate rows | 0 | 0 |
| Duplicate rows (%) | 0.0% | 0.0% |
| Total size in memory | 45.3 KiB | 45.3 KiB |
| Average record size in memory | 104.0 B | 104.0 B |
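The missing-cell percentages above can be cross-checked against the other rows. The sketch below assumes the percentage is the number of missing cells divided by the total cell count (variables × observations); the function name is illustrative and not part of the report tooling.

```python
def missing_cell_pct(missing_cells: int, n_variables: int, n_observations: int) -> float:
    """Share of missing cells, in percent, over all cells in the table."""
    total_cells = n_variables * n_observations
    return 100.0 * missing_cells / total_cells

# Values copied from the Dataset statistics table.
pct_a = missing_cell_pct(433, 12, 446)
pct_b = missing_cell_pct(449, 12, 446)
print(f"Dataset A: {pct_a:.1f}%  Dataset B: {pct_b:.1f}%")
```

Under that assumption, both results round to the percentages shown in the table.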
Variable types
| | Dataset A | Dataset B |
|---|---|---|
| Numeric | 5 | 5 |
| Categorical | 4 | 4 |
| Text | 3 | 3 |
| Dataset A | Dataset B | Alert type |
|---|---|---|
| Sex is highly overall correlated with Survived | Sex is highly overall correlated with Survived | High correlation |
| Survived is highly overall correlated with Sex | Survived is highly overall correlated with Sex | High correlation |
| Age has 82 (18.4%) missing values | Age has 94 (21.1%) missing values | Missing |
| Cabin has 350 (78.5%) missing values | Cabin has 354 (79.4%) missing values | Missing |
| PassengerId has unique values | PassengerId has unique values | Unique |
| Name has unique values | Name has unique values | Unique |
| SibSp has 300 (67.3%) zeros | SibSp has 302 (67.7%) zeros | Zeros |
| Parch has 341 (76.5%) zeros | Parch has 334 (74.9%) zeros | Zeros |
| Fare has 10 (2.2%) zeros | Fare has 9 (2.0%) zeros | Zeros |
Reproduction
| | Dataset A | Dataset B |
|---|---|---|
| Analysis started | 2025-03-18 21:18:17.884531 | 2025-03-18 21:18:19.968246 |
| Analysis finished | 2025-03-18 21:18:19.965425 | 2025-03-18 21:18:21.993726 |
| Duration | 2.08 seconds | 2.03 seconds |
| Software version | ydata-profiling v0.0.dev0 | ydata-profiling v0.0.dev0 |
| Download configuration | config.json | config.json |
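The Duration row can be reproduced from the start and finish timestamps. The following is a minimal sketch using only the Python standard library; the helper name and format string are assumptions chosen to match the timestamp layout shown in the table.

```python
from datetime import datetime

# Timestamp layout as it appears in the Reproduction table.
FMT = "%Y-%m-%d %H:%M:%S.%f"

def duration_seconds(started: str, finished: str) -> float:
    """Elapsed time between two timestamps, in seconds."""
    delta = datetime.strptime(finished, FMT) - datetime.strptime(started, FMT)
    return delta.total_seconds()

dur_a = duration_seconds("2025-03-18 21:18:17.884531", "2025-03-18 21:18:19.965425")
dur_b = duration_seconds("2025-03-18 21:18:19.968246", "2025-03-18 21:18:21.993726")
print(f"Dataset A: {dur_a:.2f} s  Dataset B: {dur_b:.2f} s")
```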
Variables
PassengerId
Real number (ℝ)
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 446 | 446 |
| Distinct (%) | 100.0% | 100.0% |
| Missing | 0 | 0 |
| Missing (%) | 0.0% | 0.0% |
| Infinite | 0 | 0 |
| Infinite (%) | 0.0% | 0.0% |
| Mean | 453.23318 | 443.73094 |
| | Dataset A | Dataset B |
|---|---|---|
| Minimum | 1 | 1 |
| Maximum | 890 | 891 |
| Zeros | 0 | 0 |
| Zeros (%) | 0.0% | 0.0% |
| Negative | 0 | 0 |
| Negative (%) | 0.0% | 0.0% |
| Memory size | 7.0 KiB | 7.0 KiB |
Quantile statistics
| | Dataset A | Dataset B |
|---|---|---|
| Minimum | 1 | 1 |
| 5-th percentile | 45.75 | 36.25 |
| Q1 | 240.5 | 214.5 |
| median | 447.5 | 453.5 |
| Q3 | 674.5 | 663.5 |
| 95-th percentile | 846.25 | 843.25 |
| Maximum | 890 | 891 |
| Range | 889 | 890 |
| Interquartile range (IQR) | 434 | 449 |
Descriptive statistics
| | Dataset A | Dataset B |
|---|---|---|
| Standard deviation | 256.61237 | 259.5166 |
| Coefficient of variation (CV) | 0.56618178 | 0.58485126 |
| Kurtosis | -1.1829226 | -1.196814 |
| Mean | 453.23318 | 443.73094 |
| Median Absolute Deviation (MAD) | 219 | 218.5 |
| Skewness | -0.036868491 | -0.037108928 |
| Sum | 202142 | 197904 |
| Variance | 65849.91 | 67348.867 |
| Monotonicity | Not monotonic | Not monotonic |
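Several rows in the quantile and descriptive tables are derived from others: the coefficient of variation is the standard deviation divided by the mean, the variance is the squared standard deviation, and the IQR and range are simple differences. The sketch below recomputes them from the PassengerId values reported above; the function name is hypothetical, and small differences in the last digits are expected because the inputs are already rounded.

```python
def derived_stats(mean, std, q1, q3, minimum, maximum):
    """Recompute the derived rows from the primary statistics."""
    return {
        "cv": std / mean,            # coefficient of variation
        "variance": std ** 2,        # squared standard deviation
        "iqr": q3 - q1,              # interquartile range
        "range": maximum - minimum,
    }

# Inputs copied from the PassengerId tables.
stats_a = derived_stats(453.23318, 256.61237, 240.5, 674.5, 1, 890)
stats_b = derived_stats(443.73094, 259.5166, 214.5, 663.5, 1, 891)
for name, stats in (("Dataset A", stats_a), ("Dataset B", stats_b)):
    print(name, {key: round(value, 5) for key, value in stats.items()})
```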
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 572 | 1 | 0.2% |
| 624 | 1 | 0.2% |
| 886 | 1 | 0.2% |
| 379 | 1 | 0.2% |
| 853 | 1 | 0.2% |
| 127 | 1 | 0.2% |
| 319 | 1 | 0.2% |
| 733 | 1 | 0.2% |
| 517 | 1 | 0.2% |
| 235 | 1 | 0.2% |
| Other values (436) | 436 |
| Value | Count | Frequency (%) |
| 220 | 1 | 0.2% |
| 173 | 1 | 0.2% |
| 27 | 1 | 0.2% |
| 756 | 1 | 0.2% |
| 707 | 1 | 0.2% |
| 592 | 1 | 0.2% |
| 798 | 1 | 0.2% |
| 148 | 1 | 0.2% |
| 424 | 1 | 0.2% |
| 518 | 1 | 0.2% |
| Other values (436) | 436 |
| Value | Count | Frequency (%) |
| 1 | 1 | |
| 4 | 1 | |
| 5 | 1 | |
| 6 | 1 | |
| 9 | 1 | |
| 11 | 1 | |
| 12 | 1 | |
| 17 | 1 | |
| 19 | 1 | |
| 21 | 1 |
| Value | Count | Frequency (%) |
| 1 | 1 | |
| 6 | 1 | |
| 7 | 1 | |
| 8 | 1 | |
| 11 | 1 | |
| 12 | 1 | |
| 14 | 1 | |
| 16 | 1 | |
| 18 | 1 | |
| 19 | 1 |
Survived
Categorical
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 2 | 2 |
| Distinct (%) | 0.4% | 0.4% |
| Missing | 0 | 0 |
| Missing (%) | 0.0% | 0.0% |
| Memory size | 7.0 KiB | 7.0 KiB |
Length
| | Dataset A | Dataset B |
|---|---|---|
| Max length | 1 | 1 |
| Median length | 1 | 1 |
| Mean length | 1 | 1 |
| Min length | 1 | 1 |
Unique
| | Dataset A | Dataset B |
|---|---|---|
| Unique | 0 | 0 |
| Unique (%) | 0.0% | 0.0% |
Sample
| | Dataset A | Dataset B |
|---|---|---|
| 1st row | 0 | 1 |
| 2nd row | 0 | 0 |
| 3rd row | 0 | 1 |
| 4th row | 0 | 1 |
| 5th row | 0 | 1 |
Common Values
Dataset A
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 274 | 61.4% |
| 1 | 172 | 38.6% |
Dataset B
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 275 | 61.7% |
| 1 | 171 | 38.3% |
[Plots: histogram of category lengths; common values for Dataset A and Dataset B]
Most occurring characters
| Value | Count | Frequency (%) |
| 0 | 274 | |
| 1 | 172 |
| Value | Count | Frequency (%) |
| 0 | 275 | |
| 1 | 171 |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 446 |
| Value | Count | Frequency (%) |
| (unknown) | 446 |
Most frequent character per category
(unknown)
| Value | Count | Frequency (%) |
| 0 | 274 | |
| 1 | 172 |
| Value | Count | Frequency (%) |
| 0 | 275 | |
| 1 | 171 |
Most occurring scripts
| Value | Count | Frequency (%) |
| (unknown) | 446 |
| Value | Count | Frequency (%) |
| (unknown) | 446 |
Most frequent character per script
(unknown)
| Value | Count | Frequency (%) |
| 0 | 274 | |
| 1 | 172 |
| Value | Count | Frequency (%) |
| 0 | 275 | |
| 1 | 171 |
Most occurring blocks
| Value | Count | Frequency (%) |
| (unknown) | 446 |
| Value | Count | Frequency (%) |
| (unknown) | 446 |
Most frequent character per block
(unknown)
| Value | Count | Frequency (%) |
| 0 | 274 | |
| 1 | 172 |
| Value | Count | Frequency (%) |
| 0 | 275 | |
| 1 | 171 |
Pclass
Categorical
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 3 | 3 |
| Distinct (%) | 0.7% | 0.7% |
| Missing | 0 | 0 |
| Missing (%) | 0.0% | 0.0% |
| Memory size | 7.0 KiB | 7.0 KiB |
Length
| | Dataset A | Dataset B |
|---|---|---|
| Max length | 1 | 1 |
| Median length | 1 | 1 |
| Mean length | 1 | 1 |
| Min length | 1 | 1 |
Unique
| | Dataset A | Dataset B |
|---|---|---|
| Unique | 0 | 0 |
| Unique (%) | 0.0% | 0.0% |
Sample
| | Dataset A | Dataset B |
|---|---|---|
| 1st row | 3 | 3 |
| 2nd row | 3 | 3 |
| 3rd row | 3 | 2 |
| 4th row | 3 | 2 |
| 5th row | 3 | 1 |
Common Values
Dataset A
| Value | Count | Frequency (%) |
|---|---|---|
| 3 | 244 | 54.7% |
| 1 | 108 | 24.2% |
| 2 | 94 | 21.1% |
Dataset B
| Value | Count | Frequency (%) |
|---|---|---|
| 3 | 253 | 56.7% |
| 1 | 98 | 22.0% |
| 2 | 95 | 21.3% |
[Plots: histogram of category lengths; common values for Dataset A and Dataset B]
Most occurring characters
| Value | Count | Frequency (%) |
| 3 | 244 | |
| 1 | 108 | |
| 2 | 94 | 21.1% |
| Value | Count | Frequency (%) |
| 3 | 253 | |
| 1 | 98 | 22.0% |
| 2 | 95 | 21.3% |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 446 |
| Value | Count | Frequency (%) |
| (unknown) | 446 |
Most frequent character per category
(unknown)
| Value | Count | Frequency (%) |
| 3 | 244 | |
| 1 | 108 | |
| 2 | 94 | 21.1% |
| Value | Count | Frequency (%) |
| 3 | 253 | |
| 1 | 98 | 22.0% |
| 2 | 95 | 21.3% |
Most occurring scripts
| Value | Count | Frequency (%) |
| (unknown) | 446 |
| Value | Count | Frequency (%) |
| (unknown) | 446 |
Most frequent character per script
(unknown)
| Value | Count | Frequency (%) |
| 3 | 244 | |
| 1 | 108 | |
| 2 | 94 | 21.1% |
| Value | Count | Frequency (%) |
| 3 | 253 | |
| 1 | 98 | 22.0% |
| 2 | 95 | 21.3% |
Most occurring blocks
| Value | Count | Frequency (%) |
| (unknown) | 446 |
| Value | Count | Frequency (%) |
| (unknown) | 446 |
Most frequent character per block
(unknown)
| Value | Count | Frequency (%) |
| 3 | 244 | |
| 1 | 108 | |
| 2 | 94 | 21.1% |
| Value | Count | Frequency (%) |
| 3 | 253 | |
| 1 | 98 | 22.0% |
| 2 | 95 | 21.3% |
Name
Text
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 446 | 446 |
| Distinct (%) | 100.0% | 100.0% |
| Missing | 0 | 0 |
| Missing (%) | 0.0% | 0.0% |
| Memory size | 7.0 KiB | 7.0 KiB |
Length
| | Dataset A | Dataset B |
|---|---|---|
| Max length | 82 | 67 |
| Median length | 48 | 49 |
| Mean length | 27.345291 | 26.847534 |
| Min length | 12 | 12 |
Unique
| | Dataset A | Dataset B |
|---|---|---|
| Unique | 446 | 446 |
| Unique (%) | 100.0% | 100.0% |
Sample
| | Dataset A | Dataset B |
|---|---|---|
| 1st row | Hansen, Mr. Henry Damsgaard | Johnson, Miss. Eleanor Ileen |
| 2nd row | Rice, Mrs. William (Margaret Norton) | Emir, Mr. Farred Chehab |
| 3rd row | Betros, Mr. Tannous | Hamalainen, Master. Viljo |
| 4th row | Boulos, Miss. Nourelain | Kelly, Mrs. Florence "Fannie" |
| 5th row | McMahon, Mr. Martin | Stephenson, Mrs. Walter Bertram (Martha Eustis) |
| Value | Count | Frequency (%) |
| mr | 251 | 13.6% |
| miss | 91 | 4.9% |
| mrs | 73 | 4.0% |
| william | 37 | 2.0% |
| john | 23 | 1.2% |
| henry | 21 | 1.1% |
| master | 19 | 1.0% |
| mary | 13 | 0.7% |
| george | 12 | 0.6% |
| james | 11 | 0.6% |
| Other values (899) | 1296 |
| Value | Count | Frequency (%) |
| mr | 260 | 14.4% |
| miss | 94 | 5.2% |
| mrs | 67 | 3.7% |
| william | 23 | 1.3% |
| john | 23 | 1.3% |
| master | 18 | 1.0% |
| henry | 17 | 0.9% |
| george | 14 | 0.8% |
| charles | 12 | 0.7% |
| james | 12 | 0.7% |
| Other values (876) | 1267 |
Most occurring characters
| Value | Count | Frequency (%) |
| (space) | 1401 | 11.5% |
| r | 992 | 8.1% |
| e | 873 | 7.2% |
| a | 834 | 6.8% |
| i | 683 | 5.6% |
| n | 642 | 5.3% |
| s | 637 | 5.2% |
| l | 583 | 4.8% |
| M | 572 | 4.7% |
| o | 532 | 4.4% |
| Other values (50) | 4447 |
| Value | Count | Frequency (%) |
| (space) | 1363 | 11.4% |
| r | 988 | 8.3% |
| e | 856 | 7.1% |
| a | 844 | 7.0% |
| n | 642 | 5.4% |
| s | 632 | 5.3% |
| i | 629 | 5.3% |
| M | 576 | 4.8% |
| l | 524 | 4.4% |
| o | 490 | 4.1% |
| Other values (50) | 4430 |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 12196 |
| Value | Count | Frequency (%) |
| (unknown) | 11974 |
Most frequent character per category
(unknown)
| Value | Count | Frequency (%) |
| (space) | 1401 | 11.5% |
| r | 992 | 8.1% |
| e | 873 | 7.2% |
| a | 834 | 6.8% |
| i | 683 | 5.6% |
| n | 642 | 5.3% |
| s | 637 | 5.2% |
| l | 583 | 4.8% |
| M | 572 | 4.7% |
| o | 532 | 4.4% |
| Other values (50) | 4447 |
| Value | Count | Frequency (%) |
| (space) | 1363 | 11.4% |
| r | 988 | 8.3% |
| e | 856 | 7.1% |
| a | 844 | 7.0% |
| n | 642 | 5.4% |
| s | 632 | 5.3% |
| i | 629 | 5.3% |
| M | 576 | 4.8% |
| l | 524 | 4.4% |
| o | 490 | 4.1% |
| Other values (50) | 4430 |
Most occurring scripts
| Value | Count | Frequency (%) |
| (unknown) | 12196 |
| Value | Count | Frequency (%) |
| (unknown) | 11974 |
Most frequent character per script
(unknown)
| Value | Count | Frequency (%) |
| (space) | 1401 | 11.5% |
| r | 992 | 8.1% |
| e | 873 | 7.2% |
| a | 834 | 6.8% |
| i | 683 | 5.6% |
| n | 642 | 5.3% |
| s | 637 | 5.2% |
| l | 583 | 4.8% |
| M | 572 | 4.7% |
| o | 532 | 4.4% |
| Other values (50) | 4447 |
| Value | Count | Frequency (%) |
| (space) | 1363 | 11.4% |
| r | 988 | 8.3% |
| e | 856 | 7.1% |
| a | 844 | 7.0% |
| n | 642 | 5.4% |
| s | 632 | 5.3% |
| i | 629 | 5.3% |
| M | 576 | 4.8% |
| l | 524 | 4.4% |
| o | 490 | 4.1% |
| Other values (50) | 4430 |
Most occurring blocks
| Value | Count | Frequency (%) |
| (unknown) | 12196 |
| Value | Count | Frequency (%) |
| (unknown) | 11974 |
Most frequent character per block
(unknown)
| Value | Count | Frequency (%) |
| (space) | 1401 | 11.5% |
| r | 992 | 8.1% |
| e | 873 | 7.2% |
| a | 834 | 6.8% |
| i | 683 | 5.6% |
| n | 642 | 5.3% |
| s | 637 | 5.2% |
| l | 583 | 4.8% |
| M | 572 | 4.7% |
| o | 532 | 4.4% |
| Other values (50) | 4447 |
| Value | Count | Frequency (%) |
| (space) | 1363 | 11.4% |
| r | 988 | 8.3% |
| e | 856 | 7.1% |
| a | 844 | 7.0% |
| n | 642 | 5.4% |
| s | 632 | 5.3% |
| i | 629 | 5.3% |
| M | 576 | 4.8% |
| l | 524 | 4.4% |
| o | 490 | 4.1% |
| Other values (50) | 4430 |
Sex
Categorical
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 2 | 2 |
| Distinct (%) | 0.4% | 0.4% |
| Missing | 0 | 0 |
| Missing (%) | 0.0% | 0.0% |
| Memory size | 7.0 KiB | 7.0 KiB |
Length
| | Dataset A | Dataset B |
|---|---|---|
| Max length | 6 | 6 |
| Median length | 4 | 4 |
| Mean length | 4.7443946 | 4.7264574 |
| Min length | 4 | 4 |
Unique
| | Dataset A | Dataset B |
|---|---|---|
| Unique | 0 | 0 |
| Unique (%) | 0.0% | 0.0% |
Sample
| | Dataset A | Dataset B |
|---|---|---|
| 1st row | male | female |
| 2nd row | female | male |
| 3rd row | male | male |
| 4th row | female | female |
| 5th row | male | female |
Common Values
Dataset A
| Value | Count | Frequency (%) |
|---|---|---|
| male | 280 | 62.8% |
| female | 166 | 37.2% |
Dataset B
| Value | Count | Frequency (%) |
|---|---|---|
| male | 284 | 63.7% |
| female | 162 | 36.3% |
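The mean length reported for Sex, and the character totals shown later in the "Most occurring categories" tables, follow from the category counts above. The sketch below assumes the mean length is the count-weighted average of each value's string length; the helper name is illustrative.

```python
from typing import Dict, Tuple

def length_summary(counts: Dict[str, int]) -> Tuple[int, float]:
    """Return (total characters, mean string length) for a value -> count map."""
    total_chars = sum(len(value) * n for value, n in counts.items())
    total_rows = sum(counts.values())
    return total_chars, total_chars / total_rows

# Counts copied from the Common Values tables for Sex.
chars_a, mean_a = length_summary({"male": 280, "female": 166})
chars_b, mean_b = length_summary({"male": 284, "female": 162})
print(chars_a, round(mean_a, 7))
print(chars_b, round(mean_b, 7))
```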
[Plots: histogram of category lengths; common values for Dataset A and Dataset B]
Most occurring characters
| Value | Count | Frequency (%) |
| e | 612 | |
| m | 446 | |
| a | 446 | |
| l | 446 | |
| f | 166 | 7.8% |
| Value | Count | Frequency (%) |
| e | 608 | |
| m | 446 | |
| a | 446 | |
| l | 446 | |
| f | 162 | 7.7% |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 2116 |
| Value | Count | Frequency (%) |
| (unknown) | 2108 |
Most frequent character per category
(unknown)
| Value | Count | Frequency (%) |
| e | 612 | |
| m | 446 | |
| a | 446 | |
| l | 446 | |
| f | 166 | 7.8% |
| Value | Count | Frequency (%) |
| e | 608 | |
| m | 446 | |
| a | 446 | |
| l | 446 | |
| f | 162 | 7.7% |
Most occurring scripts
| Value | Count | Frequency (%) |
| (unknown) | 2116 |
| Value | Count | Frequency (%) |
| (unknown) | 2108 |
Most frequent character per script
(unknown)
| Value | Count | Frequency (%) |
| e | 612 | |
| m | 446 | |
| a | 446 | |
| l | 446 | |
| f | 166 | 7.8% |
| Value | Count | Frequency (%) |
| e | 608 | |
| m | 446 | |
| a | 446 | |
| l | 446 | |
| f | 162 | 7.7% |
Most occurring blocks
| Value | Count | Frequency (%) |
| (unknown) | 2116 |
| Value | Count | Frequency (%) |
| (unknown) | 2108 |
Most frequent character per block
(unknown)
| Value | Count | Frequency (%) |
| e | 612 | |
| m | 446 | |
| a | 446 | |
| l | 446 | |
| f | 166 | 7.8% |
| Value | Count | Frequency (%) |
| e | 608 | |
| m | 446 | |
| a | 446 | |
| l | 446 | |
| f | 162 | 7.7% |
Age
Real number (ℝ)
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 72 | 79 |
| Distinct (%) | 19.8% | 22.4% |
| Missing | 82 | 94 |
| Missing (%) | 18.4% | 21.1% |
| Infinite | 0 | 0 |
| Infinite (%) | 0.0% | 0.0% |
| Mean | 29.878434 | 29.587614 |
| | Dataset A | Dataset B |
|---|---|---|
| Minimum | 0.83 | 0.42 |
| Maximum | 74 | 74 |
| Zeros | 0 | 0 |
| Zeros (%) | 0.0% | 0.0% |
| Negative | 0 | 0 |
| Negative (%) | 0.0% | 0.0% |
| Memory size | 7.0 KiB | 7.0 KiB |
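For a column with missing values, the Distinct (%) figures above appear to use the non-missing row count as the denominator rather than all 446 observations. This reading is inferred from the numbers and is not stated by the report; the sketch below checks it for Age with a hypothetical helper.

```python
def distinct_pct(distinct: int, n_rows: int, n_missing: int) -> float:
    """Distinct values as a percentage of the non-missing rows (assumed denominator)."""
    return 100.0 * distinct / (n_rows - n_missing)

# Values copied from the Age overview table.
age_a = distinct_pct(72, 446, 82)
age_b = distinct_pct(79, 446, 94)
print(f"Dataset A: {age_a:.1f}%  Dataset B: {age_b:.1f}%")
```

Dividing by all 446 rows instead would give roughly 16.1% and 17.7%, which does not match the table.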
Quantile statistics
| | Dataset A | Dataset B |
|---|---|---|
| Minimum | 0.83 | 0.42 |
| 5-th percentile | 4 | 4 |
| Q1 | 20.875 | 21 |
| median | 28 | 28.75 |
| Q3 | 38 | 38 |
| 95-th percentile | 56 | 58 |
| Maximum | 74 | 74 |
| Range | 73.17 | 73.58 |
| Interquartile range (IQR) | 17.125 | 17 |
Descriptive statistics
| | Dataset A | Dataset B |
|---|---|---|
| Standard deviation | 14.433438 | 14.619999 |
| Coefficient of variation (CV) | 0.48307209 | 0.49412565 |
| Kurtosis | 0.15944868 | 0.19413525 |
| Mean | 29.878434 | 29.587614 |
| Median Absolute Deviation (MAD) | 8 | 8.25 |
| Skewness | 0.39265655 | 0.35147572 |
| Sum | 10875.75 | 10414.84 |
| Variance | 208.32412 | 213.74437 |
| Monotonicity | Not monotonic | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 24 | 18 | 4.0% |
| 18 | 16 | 3.6% |
| 28 | 15 | 3.4% |
| 22 | 15 | 3.4% |
| 35 | 15 | 3.4% |
| 19 | 13 | 2.9% |
| 36 | 12 | 2.7% |
| 30 | 12 | 2.7% |
| 23 | 11 | 2.5% |
| 25 | 10 | 2.2% |
| Other values (62) | 227 | |
| (Missing) | 82 | 18.4% |
| Value | Count | Frequency (%) |
| 24 | 15 | 3.4% |
| 22 | 15 | 3.4% |
| 36 | 14 | 3.1% |
| 30 | 12 | 2.7% |
| 29 | 12 | 2.7% |
| 28 | 12 | 2.7% |
| 31 | 12 | 2.7% |
| 18 | 12 | 2.7% |
| 19 | 12 | 2.7% |
| 33 | 11 | 2.5% |
| Other values (69) | 225 | |
| (Missing) | 94 |
| Value | Count | Frequency (%) |
| 0.83 | 1 | 0.2% |
| 0.92 | 1 | 0.2% |
| 1 | 2 | 0.4% |
| 2 | 6 | |
| 3 | 6 | |
| 4 | 4 | |
| 6 | 2 | 0.4% |
| 7 | 2 | 0.4% |
| 8 | 1 | 0.2% |
| 9 | 5 |
| Value | Count | Frequency (%) |
| 0.42 | 1 | 0.2% |
| 0.67 | 1 | 0.2% |
| 0.75 | 2 | 0.4% |
| 0.83 | 1 | 0.2% |
| 0.92 | 1 | 0.2% |
| 1 | 5 | |
| 2 | 3 | |
| 3 | 3 | |
| 4 | 5 | |
| 5 | 2 | 0.4% |
SibSp
Real number (ℝ)
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 7 | 7 |
| Distinct (%) | 1.6% | 1.6% |
| Missing | 0 | 0 |
| Missing (%) | 0.0% | 0.0% |
| Infinite | 0 | 0 |
| Infinite (%) | 0.0% | 0.0% |
| Mean | 0.54035874 | 0.55829596 |
| | Dataset A | Dataset B |
|---|---|---|
| Minimum | 0 | 0 |
| Maximum | 8 | 8 |
| Zeros | 300 | 302 |
| Zeros (%) | 67.3% | 67.7% |
| Negative | 0 | 0 |
| Negative (%) | 0.0% | 0.0% |
| Memory size | 7.0 KiB | 7.0 KiB |
Quantile statistics
| | Dataset A | Dataset B |
|---|---|---|
| Minimum | 0 | 0 |
| 5-th percentile | 0 | 0 |
| Q1 | 0 | 0 |
| median | 0 | 0 |
| Q3 | 1 | 1 |
| 95-th percentile | 3 | 3 |
| Maximum | 8 | 8 |
| Range | 8 | 8 |
| Interquartile range (IQR) | 1 | 1 |
Descriptive statistics
| | Dataset A | Dataset B |
|---|---|---|
| Standard deviation | 1.0692574 | 1.1572772 |
| Coefficient of variation (CV) | 1.9787917 | 2.0728741 |
| Kurtosis | 13.655148 | 15.993364 |
| Mean | 0.54035874 | 0.55829596 |
| Median Absolute Deviation (MAD) | 0 | 0 |
| Skewness | 3.2015606 | 3.5058105 |
| Sum | 241 | 249 |
| Variance | 1.1433113 | 1.3392906 |
| Monotonicity | Not monotonic | Not monotonic |
Histogram with fixed size bins (bins=7)
| Value | Count | Frequency (%) |
| 0 | 300 | |
| 1 | 104 | 23.3% |
| 2 | 16 | 3.6% |
| 3 | 11 | 2.5% |
| 4 | 9 | 2.0% |
| 5 | 4 | 0.9% |
| 8 | 2 | 0.4% |
| Value | Count | Frequency (%) |
| 0 | 302 | |
| 1 | 101 | 22.6% |
| 2 | 15 | 3.4% |
| 3 | 12 | 2.7% |
| 4 | 10 | 2.2% |
| 8 | 4 | 0.9% |
| 5 | 2 | 0.4% |
| Value | Count | Frequency (%) |
| 0 | 300 | |
| 1 | 104 | 23.3% |
| 2 | 16 | 3.6% |
| 3 | 11 | 2.5% |
| 4 | 9 | 2.0% |
| 5 | 4 | 0.9% |
| 8 | 2 | 0.4% |
| Value | Count | Frequency (%) |
| 0 | 302 | |
| 1 | 101 | 22.6% |
| 2 | 15 | 3.4% |
| 3 | 12 | 2.7% |
| 4 | 10 | 2.2% |
| 5 | 2 | 0.4% |
| 8 | 4 | 0.9% |
Parch
Real number (ℝ)
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 7 | 6 |
| Distinct (%) | 1.6% | 1.3% |
| Missing | 0 | 0 |
| Missing (%) | 0.0% | 0.0% |
| Infinite | 0 | 0 |
| Infinite (%) | 0.0% | 0.0% |
| Mean | 0.39686099 | 0.4058296 |
| | Dataset A | Dataset B |
|---|---|---|
| Minimum | 0 | 0 |
| Maximum | 6 | 5 |
| Zeros | 341 | 334 |
| Zeros (%) | 76.5% | 74.9% |
| Negative | 0 | 0 |
| Negative (%) | 0.0% | 0.0% |
| Memory size | 7.0 KiB | 7.0 KiB |
Quantile statistics
| | Dataset A | Dataset B |
|---|---|---|
| Minimum | 0 | 0 |
| 5-th percentile | 0 | 0 |
| Q1 | 0 | 0 |
| median | 0 | 0 |
| Q3 | 0 | 0.75 |
| 95-th percentile | 2 | 2 |
| Maximum | 6 | 5 |
| Range | 6 | 5 |
| Interquartile range (IQR) | 0 | 0.75 |
Descriptive statistics
| | Dataset A | Dataset B |
|---|---|---|
| Standard deviation | 0.86213167 | 0.84739501 |
| Coefficient of variation (CV) | 2.172377 | 2.0880562 |
| Kurtosis | 10.403407 | 9.7373855 |
| Mean | 0.39686099 | 0.4058296 |
| Median Absolute Deviation (MAD) | 0 | 0 |
| Skewness | 2.8791806 | 2.7902118 |
| Sum | 177 | 181 |
| Variance | 0.74327102 | 0.7180783 |
| Monotonicity | Not monotonic | Not monotonic |
Histogram with fixed size bins (bins=7)
| Value | Count | Frequency (%) |
| 0 | 341 | |
| 1 | 55 | 12.3% |
| 2 | 40 | 9.0% |
| 5 | 3 | 0.7% |
| 3 | 3 | 0.7% |
| 4 | 3 | 0.7% |
| 6 | 1 | 0.2% |
| Value | Count | Frequency (%) |
| 0 | 334 | |
| 1 | 63 | 14.1% |
| 2 | 41 | 9.2% |
| 5 | 5 | 1.1% |
| 4 | 2 | 0.4% |
| 3 | 1 | 0.2% |
| Value | Count | Frequency (%) |
| 0 | 341 | |
| 1 | 55 | 12.3% |
| 2 | 40 | 9.0% |
| 3 | 3 | 0.7% |
| 4 | 3 | 0.7% |
| 5 | 3 | 0.7% |
| 6 | 1 | 0.2% |
| Value | Count | Frequency (%) |
| 0 | 334 | |
| 1 | 63 | 14.1% |
| 2 | 41 | 9.2% |
| 3 | 1 | 0.2% |
| 4 | 2 | 0.4% |
| 5 | 5 | 1.1% |
Ticket
Text
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 376 | 377 |
| Distinct (%) | 84.3% | 84.5% |
| Missing | 0 | 0 |
| Missing (%) | 0.0% | 0.0% |
| Memory size | 7.0 KiB | 7.0 KiB |
Length
| | Dataset A | Dataset B |
|---|---|---|
| Max length | 18 | 18 |
| Median length | 6 | 17 |
| Mean length | 6.661435 | 6.7488789 |
| Min length | 3 | 3 |
Unique
| | Dataset A | Dataset B |
|---|---|---|
| Unique | 325 | 325 |
| Unique (%) | 72.9% | 72.9% |
Sample
| | Dataset A | Dataset B |
|---|---|---|
| 1st row | 350029 | 347742 |
| 2nd row | 382652 | 2631 |
| 3rd row | 2648 | 250649 |
| 4th row | 2678 | 223596 |
| 5th row | 370372 | 36947 |
| Value | Count | Frequency (%) |
| pc | 30 | 5.4% |
| c.a | 12 | 2.2% |
| ca | 8 | 1.4% |
| a/5 | 7 | 1.3% |
| 2144 | 5 | 0.9% |
| c | 5 | 0.9% |
| ston/o | 5 | 0.9% |
| 2 | 5 | 0.9% |
| 1601 | 5 | 0.9% |
| sc/paris | 5 | 0.9% |
| Other values (392) | 468 |
| Value | Count | Frequency (%) |
| pc | 27 | 4.8% |
| c.a | 14 | 2.5% |
| a/5 | 8 | 1.4% |
| ca | 7 | 1.2% |
| ston/o | 7 | 1.2% |
| 2 | 7 | 1.2% |
| a/4 | 6 | 1.1% |
| 3101295 | 5 | 0.9% |
| sc/paris | 5 | 0.9% |
| w./c | 5 | 0.9% |
| Other values (398) | 477 |
Most occurring characters
| Value | Count | Frequency (%) |
| 3 | 374 | |
| 1 | 346 | |
| 2 | 291 | |
| 7 | 248 | |
| 4 | 238 | |
| 6 | 213 | 7.2% |
| 0 | 209 | 7.0% |
| 9 | 186 | 6.3% |
| 5 | 180 | 6.1% |
| 8 | 141 | 4.7% |
| Other values (22) | 545 |
| Value | Count | Frequency (%) |
| 3 | 360 | |
| 1 | 320 | |
| 2 | 285 | |
| 4 | 253 | |
| 7 | 245 | 8.1% |
| 6 | 217 | 7.2% |
| 0 | 205 | 6.8% |
| 5 | 199 | 6.6% |
| 9 | 181 | 6.0% |
| 8 | 130 | 4.3% |
| Other values (25) | 615 |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 2971 |
| Value | Count | Frequency (%) |
| (unknown) | 3010 |
Most frequent character per category
(unknown)
| Value | Count | Frequency (%) |
| 3 | 374 | |
| 1 | 346 | |
| 2 | 291 | |
| 7 | 248 | |
| 4 | 238 | |
| 6 | 213 | 7.2% |
| 0 | 209 | 7.0% |
| 9 | 186 | 6.3% |
| 5 | 180 | 6.1% |
| 8 | 141 | 4.7% |
| Other values (22) | 545 |
| Value | Count | Frequency (%) |
| 3 | 360 | |
| 1 | 320 | |
| 2 | 285 | |
| 4 | 253 | |
| 7 | 245 | 8.1% |
| 6 | 217 | 7.2% |
| 0 | 205 | 6.8% |
| 5 | 199 | 6.6% |
| 9 | 181 | 6.0% |
| 8 | 130 | 4.3% |
| Other values (25) | 615 |
Most occurring scripts
| Value | Count | Frequency (%) |
| (unknown) | 2971 |
| Value | Count | Frequency (%) |
| (unknown) | 3010 |
Most frequent character per script
(unknown)
| Value | Count | Frequency (%) |
| 3 | 374 | |
| 1 | 346 | |
| 2 | 291 | |
| 7 | 248 | |
| 4 | 238 | |
| 6 | 213 | 7.2% |
| 0 | 209 | 7.0% |
| 9 | 186 | 6.3% |
| 5 | 180 | 6.1% |
| 8 | 141 | 4.7% |
| Other values (22) | 545 |
| Value | Count | Frequency (%) |
| 3 | 360 | |
| 1 | 320 | |
| 2 | 285 | |
| 4 | 253 | |
| 7 | 245 | 8.1% |
| 6 | 217 | 7.2% |
| 0 | 205 | 6.8% |
| 5 | 199 | 6.6% |
| 9 | 181 | 6.0% |
| 8 | 130 | 4.3% |
| Other values (25) | 615 |
Most occurring blocks
| Value | Count | Frequency (%) |
| (unknown) | 2971 |
| Value | Count | Frequency (%) |
| (unknown) | 3010 |
Most frequent character per block
(unknown)
| Value | Count | Frequency (%) |
| 3 | 374 | |
| 1 | 346 | |
| 2 | 291 | |
| 7 | 248 | |
| 4 | 238 | |
| 6 | 213 | 7.2% |
| 0 | 209 | 7.0% |
| 9 | 186 | 6.3% |
| 5 | 180 | 6.1% |
| 8 | 141 | 4.7% |
| Other values (22) | 545 |
| Value | Count | Frequency (%) |
| 3 | 360 | |
| 1 | 320 | |
| 2 | 285 | |
| 4 | 253 | |
| 7 | 245 | 8.1% |
| 6 | 217 | 7.2% |
| 0 | 205 | 6.8% |
| 5 | 199 | 6.6% |
| 9 | 181 | 6.0% |
| 8 | 130 | 4.3% |
| Other values (25) | 615 |
Fare
Real number (ℝ)
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 169 | 180 |
| Distinct (%) | 37.9% | 40.4% |
| Missing | 0 | 0 |
| Missing (%) | 0.0% | 0.0% |
| Infinite | 0 | 0 |
| Infinite (%) | 0.0% | 0.0% |
| Mean | 31.779016 | 33.132539 |
| | Dataset A | Dataset B |
|---|---|---|
| Minimum | 0 | 0 |
| Maximum | 512.3292 | 512.3292 |
| Zeros | 10 | 9 |
| Zeros (%) | 2.2% | 2.0% |
| Negative | 0 | 0 |
| Negative (%) | 0.0% | 0.0% |
| Memory size | 7.0 KiB | 7.0 KiB |
Quantile statistics
| | Dataset A | Dataset B |
|---|---|---|
| Minimum | 0 | 0 |
| 5-th percentile | 7.06875 | 7.225 |
| Q1 | 7.8958 | 7.9031 |
| median | 14.4542 | 14.5 |
| Q3 | 30.92395 | 29.925 |
| 95-th percentile | 112.67708 | 118.31875 |
| Maximum | 512.3292 | 512.3292 |
| Range | 512.3292 | 512.3292 |
| Interquartile range (IQR) | 23.02815 | 22.0219 |
Descriptive statistics
| | Dataset A | Dataset B |
|---|---|---|
| Standard deviation | 50.310416 | 54.521731 |
| Coefficient of variation (CV) | 1.5831332 | 1.6455645 |
| Kurtosis | 40.129611 | 30.294074 |
| Mean | 31.779016 | 33.132539 |
| Median Absolute Deviation (MAD) | 7.0813 | 6.8125 |
| Skewness | 5.2875588 | 4.7158345 |
| Sum | 14173.441 | 14777.113 |
| Variance | 2531.1379 | 2972.6192 |
| Monotonicity | Not monotonic | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 13 | 24 | 5.4% |
| 7.8958 | 19 | 4.3% |
| 7.75 | 18 | 4.0% |
| 8.05 | 16 | 3.6% |
| 10.5 | 13 | 2.9% |
| 26 | 13 | 2.9% |
| 7.775 | 11 | 2.5% |
| 7.925 | 10 | 2.2% |
| 0 | 10 | 2.2% |
| 26.55 | 10 | 2.2% |
| Other values (159) | 302 |
| Value | Count | Frequency (%) |
| 7.8958 | 21 | 4.7% |
| 8.05 | 19 | 4.3% |
| 13 | 17 | 3.8% |
| 26 | 16 | 3.6% |
| 7.75 | 15 | 3.4% |
| 10.5 | 11 | 2.5% |
| 0 | 9 | 2.0% |
| 26.55 | 9 | 2.0% |
| 7.925 | 9 | 2.0% |
| 8.6625 | 8 | 1.8% |
| Other values (170) | 312 |
| Value | Count | Frequency (%) |
| 0 | 10 | |
| 4.0125 | 1 | 0.2% |
| 5 | 1 | 0.2% |
| 6.2375 | 1 | 0.2% |
| 6.4375 | 1 | 0.2% |
| 6.45 | 1 | 0.2% |
| 6.4958 | 2 | 0.4% |
| 6.75 | 1 | 0.2% |
| 6.95 | 1 | 0.2% |
| 6.975 | 1 | 0.2% |
| Value | Count | Frequency (%) |
| 0 | 9 | |
| 5 | 1 | 0.2% |
| 6.2375 | 1 | 0.2% |
| 6.4375 | 1 | 0.2% |
| 6.45 | 1 | 0.2% |
| 6.75 | 1 | 0.2% |
| 6.8583 | 1 | 0.2% |
| 6.975 | 2 | 0.4% |
| 7.05 | 2 | 0.4% |
| 7.125 | 3 | 0.7% |
Cabin
Text
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 77 | 76 |
| Distinct (%) | 80.2% | 82.6% |
| Missing | 350 | 354 |
| Missing (%) | 78.5% | 79.4% |
| Memory size | 7.0 KiB | 7.0 KiB |
Length
| | Dataset A | Dataset B |
|---|---|---|
| Max length | 11 | 11 |
| Median length | 3 | 3 |
| Mean length | 3.65625 | 3.6630435 |
| Min length | 1 | 1 |
Unique
| | Dataset A | Dataset B |
|---|---|---|
| Unique | 60 | 64 |
| Unique (%) | 62.5% | 69.6% |
Sample
| | Dataset A | Dataset B |
|---|---|---|
| 1st row | C7 | D20 |
| 2nd row | F33 | F33 |
| 3rd row | C126 | B73 |
| 4th row | B49 | A7 |
| 5th row | C78 | B5 |
| Value | Count | Frequency (%) |
| b96 | 3 | 2.7% |
| b98 | 3 | 2.7% |
| c23 | 3 | 2.7% |
| c25 | 3 | 2.7% |
| c27 | 3 | 2.7% |
| b49 | 2 | 1.8% |
| c78 | 2 | 1.8% |
| c126 | 2 | 1.8% |
| c22 | 2 | 1.8% |
| c26 | 2 | 1.8% |
| Other values (76) | 88 |
| Value | Count | Frequency (%) |
| c23 | 4 | 3.6% |
| c25 | 4 | 3.6% |
| c27 | 4 | 3.6% |
| f33 | 3 | 2.7% |
| g6 | 3 | 2.7% |
| b35 | 2 | 1.8% |
| c83 | 2 | 1.8% |
| b58 | 2 | 1.8% |
| b60 | 2 | 1.8% |
| c52 | 2 | 1.8% |
| Other values (75) | 82 |
Most occurring characters
| Value | Count | Frequency (%) |
| C | 40 | |
| 3 | 37 | |
| 2 | 32 | 9.1% |
| B | 29 | 8.3% |
| 1 | 26 | 7.4% |
| 6 | 24 | 6.8% |
| 8 | 21 | 6.0% |
| 9 | 20 | 5.7% |
| (space) | 17 | 4.8% |
| 0 | 16 | 4.6% |
| Other values (9) | 89 |
| Value | Count | Frequency (%) |
| 2 | 36 | |
| C | 35 | |
| B | 33 | |
| 3 | 31 | 9.2% |
| 5 | 26 | 7.7% |
| 1 | 21 | 6.2% |
| 6 | 20 | 5.9% |
| 7 | 18 | 5.3% |
| 0 | 18 | 5.3% |
| (space) | 18 | 5.3% |
| Other values (8) | 81 |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 351 |
| Value | Count | Frequency (%) |
| (unknown) | 337 |
Most frequent character per category
(unknown)
| Value | Count | Frequency (%) |
| C | 40 | |
| 3 | 37 | |
| 2 | 32 | 9.1% |
| B | 29 | 8.3% |
| 1 | 26 | 7.4% |
| 6 | 24 | 6.8% |
| 8 | 21 | 6.0% |
| 9 | 20 | 5.7% |
| (space) | 17 | 4.8% |
| 0 | 16 | 4.6% |
| Other values (9) | 89 |
| Value | Count | Frequency (%) |
| 2 | 36 | |
| C | 35 | |
| B | 33 | |
| 3 | 31 | 9.2% |
| 5 | 26 | 7.7% |
| 1 | 21 | 6.2% |
| 6 | 20 | 5.9% |
| 7 | 18 | 5.3% |
| 0 | 18 | 5.3% |
| (space) | 18 | 5.3% |
| Other values (8) | 81 |
Most occurring scripts
| Value | Count | Frequency (%) |
| (unknown) | 351 |
| Value | Count | Frequency (%) |
| (unknown) | 337 |
Most frequent character per script
(unknown)
| Value | Count | Frequency (%) |
| C | 40 | |
| 3 | 37 | |
| 2 | 32 | 9.1% |
| B | 29 | 8.3% |
| 1 | 26 | 7.4% |
| 6 | 24 | 6.8% |
| 8 | 21 | 6.0% |
| 9 | 20 | 5.7% |
| (space) | 17 | 4.8% |
| 0 | 16 | 4.6% |
| Other values (9) | 89 |
| Value | Count | Frequency (%) |
| 2 | 36 | |
| C | 35 | |
| B | 33 | |
| 3 | 31 | 9.2% |
| 5 | 26 | 7.7% |
| 1 | 21 | 6.2% |
| 6 | 20 | 5.9% |
| 7 | 18 | 5.3% |
| 0 | 18 | 5.3% |
| (space) | 18 | 5.3% |
| Other values (8) | 81 |
Most occurring blocks
| Value | Count | Frequency (%) |
| (unknown) | 351 |
| Value | Count | Frequency (%) |
| (unknown) | 337 |
Most frequent character per block
(unknown)
| Value | Count | Frequency (%) |
| C | 40 | |
| 3 | 37 | |
| 2 | 32 | 9.1% |
| B | 29 | 8.3% |
| 1 | 26 | 7.4% |
| 6 | 24 | 6.8% |
| 8 | 21 | 6.0% |
| 9 | 20 | 5.7% |
| (space) | 17 | 4.8% |
| 0 | 16 | 4.6% |
| Other values (9) | 89 |
| Value | Count | Frequency (%) |
| 2 | 36 | |
| C | 35 | |
| B | 33 | |
| 3 | 31 | 9.2% |
| 5 | 26 | 7.7% |
| 1 | 21 | 6.2% |
| 6 | 20 | 5.9% |
| 7 | 18 | 5.3% |
| 0 | 18 | 5.3% |
| (space) | 18 | 5.3% |
| Other values (8) | 81 |
Embarked
Categorical
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 3 | 3 |
| Distinct (%) | 0.7% | 0.7% |
| Missing | 1 | 1 |
| Missing (%) | 0.2% | 0.2% |
| Memory size | 7.0 KiB | 7.0 KiB |
Length
| | Dataset A | Dataset B |
|---|---|---|
| Max length | 1 | 1 |
| Median length | 1 | 1 |
| Mean length | 1 | 1 |
| Min length | 1 | 1 |
Unique
| | Dataset A | Dataset B |
|---|---|---|
| Unique | 0 | 0 |
| Unique (%) | 0.0% | 0.0% |
Sample
| | Dataset A | Dataset B |
|---|---|---|
| 1st row | S | S |
| 2nd row | Q | C |
| 3rd row | C | S |
| 4th row | C | S |
| 5th row | Q | C |
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| S | 323 | 72.4% |
| C | 82 | 18.4% |
| Q | 40 | 9.0% |
| (Missing) | 1 | 0.2% |

| Value | Count | Frequency (%) |
|---|---|---|
| S | 330 | 74.0% |
| C | 79 | 17.7% |
| Q | 36 | 8.1% |
| (Missing) | 1 | 0.2% |
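The Frequency (%) values in the Common Values tables are consistent with each count divided by the 446 observations, missing values included (for example, 82 / 446 ≈ 18.4% for C in the first table). A minimal Python sketch of that calculation follows. The function name `frequency_table` and the reconstructed value list are illustrative assumptions, not part of ydata-profiling.

```python
from collections import Counter

def frequency_table(values):
    """Return (value, count, percent) rows, most common value first."""
    # Treat None as a missing value and label it the way the report does.
    counts = Counter("(Missing)" if v is None else v for v in values)
    total = len(values)
    return [
        (value, count, round(100 * count / total, 1))
        for value, count in counts.most_common()
    ]

# Value list rebuilt from the counts reported for the first table.
embarked = ["S"] * 323 + ["C"] * 82 + ["Q"] * 40 + [None]

for value, count, percent in frequency_table(embarked):
    print(f"{value}: {count} ({percent}%)")
```

The per-character tables further down appear to use 445 as the denominator instead, since the missing entry contributes no characters.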
Length
Histogram of lengths of the category
Common Values (Plot)
Dataset A
Dataset B
| Value | Count | Frequency (%) |
|---|---|---|
| s | 323 | 72.6% |
| c | 82 | 18.4% |
| q | 40 | 9.0% |

| Value | Count | Frequency (%) |
|---|---|---|
| s | 330 | 74.2% |
| c | 79 | 17.8% |
| q | 36 | 8.1% |
Most occurring characters
| Value | Count | Frequency (%) |
|---|---|---|
| S | 323 | 72.6% |
| C | 82 | 18.4% |
| Q | 40 | 9.0% |

| Value | Count | Frequency (%) |
|---|---|---|
| S | 330 | 74.2% |
| C | 79 | 17.8% |
| Q | 36 | 8.1% |
Most occurring categories
| Value | Count | Frequency (%) |
|---|---|---|
| (unknown) | 445 | 100.0% |

| Value | Count | Frequency (%) |
|---|---|---|
| (unknown) | 445 | 100.0% |
Most frequent character per category
(unknown)
| Value | Count | Frequency (%) |
|---|---|---|
| S | 323 | 72.6% |
| C | 82 | 18.4% |
| Q | 40 | 9.0% |

| Value | Count | Frequency (%) |
|---|---|---|
| S | 330 | 74.2% |
| C | 79 | 17.8% |
| Q | 36 | 8.1% |
Most occurring scripts
| Value | Count | Frequency (%) |
|---|---|---|
| (unknown) | 445 | 100.0% |

| Value | Count | Frequency (%) |
|---|---|---|
| (unknown) | 445 | 100.0% |
Most frequent character per script
(unknown)
| Value | Count | Frequency (%) |
|---|---|---|
| S | 323 | 72.6% |
| C | 82 | 18.4% |
| Q | 40 | 9.0% |

| Value | Count | Frequency (%) |
|---|---|---|
| S | 330 | 74.2% |
| C | 79 | 17.8% |
| Q | 36 | 8.1% |
Most occurring blocks
| Value | Count | Frequency (%) |
|---|---|---|
| (unknown) | 445 | 100.0% |

| Value | Count | Frequency (%) |
|---|---|---|
| (unknown) | 445 | 100.0% |
Most frequent character per block
(unknown)
| Value | Count | Frequency (%) |
|---|---|---|
| S | 323 | 72.6% |
| C | 82 | 18.4% |
| Q | 40 | 9.0% |

| Value | Count | Frequency (%) |
|---|---|---|
| S | 330 | 74.2% |
| C | 79 | 17.8% |
| Q | 36 | 8.1% |
Interactions
Dataset A
Dataset B
Correlations
Dataset A
Dataset B
Dataset A
| | Age | Embarked | Fare | Parch | PassengerId | Pclass | Sex | SibSp | Survived |
|---|---|---|---|---|---|---|---|---|---|
| Age | 1.000 | 0.071 | 0.115 | -0.262 | 0.031 | 0.268 | 0.080 | -0.185 | 0.102 |
| Embarked | 0.071 | 1.000 | 0.193 | 0.038 | 0.000 | 0.253 | 0.054 | 0.111 | 0.039 |
| Fare | 0.115 | 0.193 | 1.000 | 0.393 | -0.035 | 0.469 | 0.147 | 0.473 | 0.338 |
| Parch | -0.262 | 0.038 | 0.393 | 1.000 | 0.002 | 0.000 | 0.297 | 0.422 | 0.154 |
| PassengerId | 0.031 | 0.000 | -0.035 | 0.002 | 1.000 | 0.000 | 0.083 | -0.106 | 0.000 |
| Pclass | 0.268 | 0.253 | 0.469 | 0.000 | 0.000 | 1.000 | 0.088 | 0.137 | 0.384 |
| Sex | 0.080 | 0.054 | 0.147 | 0.297 | 0.083 | 0.088 | 1.000 | 0.240 | 0.527 |
| SibSp | -0.185 | 0.111 | 0.473 | 0.422 | -0.106 | 0.137 | 0.240 | 1.000 | 0.203 |
| Survived | 0.102 | 0.039 | 0.338 | 0.154 | 0.000 | 0.384 | 0.527 | 0.203 | 1.000 |
Dataset B
| | Age | Embarked | Fare | Parch | PassengerId | Pclass | Sex | SibSp | Survived |
|---|---|---|---|---|---|---|---|---|---|
| Age | 1.000 | 0.089 | 0.055 | -0.240 | 0.018 | 0.192 | 0.131 | -0.240 | 0.124 |
| Embarked | 0.089 | 1.000 | 0.215 | 0.000 | 0.000 | 0.232 | 0.024 | 0.024 | 0.154 |
| Fare | 0.055 | 0.215 | 1.000 | 0.431 | 0.006 | 0.468 | 0.216 | 0.426 | 0.256 |
| Parch | -0.240 | 0.000 | 0.431 | 1.000 | -0.026 | 0.033 | 0.273 | 0.433 | 0.159 |
| PassengerId | 0.018 | 0.000 | 0.006 | -0.026 | 1.000 | 0.000 | 0.000 | -0.073 | 0.112 |
| Pclass | 0.192 | 0.232 | 0.468 | 0.033 | 0.000 | 1.000 | 0.162 | 0.112 | 0.391 |
| Sex | 0.131 | 0.024 | 0.216 | 0.273 | 0.000 | 0.162 | 1.000 | 0.183 | 0.587 |
| SibSp | -0.240 | 0.024 | 0.426 | 0.433 | -0.073 | 0.112 | 0.183 | 1.000 | 0.140 |
| Survived | 0.124 | 0.154 | 0.256 | 0.159 | 0.112 | 0.391 | 0.587 | 0.140 | 1.000 |
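One way to read these matrices is to list the variable pairs whose absolute coefficient reaches a cutoff. The sketch below does this for a handful of off-diagonal entries copied from the Dataset A matrix. The 0.5 cutoff and the helper name `high_correlation_pairs` are assumptions chosen for illustration, not values stated in this report.

```python
def high_correlation_pairs(coefficients, threshold=0.5):
    """Return variable pairs whose absolute coefficient meets the threshold."""
    return sorted(
        pair for pair, value in coefficients.items() if abs(value) >= threshold
    )

# A few off-diagonal entries copied from the Dataset A matrix.
dataset_a = {
    ("Sex", "Survived"): 0.527,
    ("Fare", "SibSp"): 0.473,
    ("Fare", "Pclass"): 0.469,
    ("Parch", "SibSp"): 0.422,
    ("Pclass", "Survived"): 0.384,
    ("Age", "Parch"): -0.262,
}

flagged = high_correlation_pairs(dataset_a)
print(flagged)
```

With this cutoff, only Sex and Survived are flagged among the listed entries; the corresponding Dataset B coefficient is 0.587.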
Missing values
Dataset A
A simple visualization of nullity by column.
Dataset B
A simple visualization of nullity by column.
Dataset A
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
Dataset B
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
Dataset A
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
Dataset B
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
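As a rough illustration of nullity correlation, the sketch below correlates the missingness indicators (1 = missing, 0 = present) of two columns. It assumes a plain Pearson correlation and uses hypothetical toy rows; the report does not state exactly how its heatmap is computed, and the helper names are illustrative.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    # Undefined (division by zero) if a column is never or always missing.
    return cov / sqrt(var_x * var_y)

def nullity_correlation(rows, col_a, col_b):
    """Correlate the missingness indicators (1 = missing) of two columns."""
    missing_a = [1 if row[col_a] is None else 0 for row in rows]
    missing_b = [1 if row[col_b] is None else 0 for row in rows]
    return pearson(missing_a, missing_b)

# Hypothetical toy rows, not taken from either dataset.
rows = [
    {"Age": 22.0, "Cabin": "C85"},
    {"Age": None, "Cabin": None},
    {"Age": 26.0, "Cabin": None},
    {"Age": None, "Cabin": None},
]

score = nullity_correlation(rows, "Age", "Cabin")
print(round(score, 3))
```

A value near 1 means the two columns tend to be missing together, a value near -1 means one tends to be present when the other is missing, and a value near 0 means the missingness patterns are unrelated.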
Sample
Dataset A
| | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 623 | 624 | 0 | 3 | Hansen, Mr. Henry Damsgaard | male | 21.0 | 0 | 0 | 350029 | 7.8542 | NaN | S |
| 885 | 886 | 0 | 3 | Rice, Mrs. William (Margaret Norton) | female | 39.0 | 0 | 5 | 382652 | 29.1250 | NaN | Q |
| 378 | 379 | 0 | 3 | Betros, Mr. Tannous | male | 20.0 | 0 | 0 | 2648 | 4.0125 | NaN | C |
| 852 | 853 | 0 | 3 | Boulos, Miss. Nourelain | female | 9.0 | 1 | 1 | 2678 | 15.2458 | NaN | C |
| 126 | 127 | 0 | 3 | McMahon, Mr. Martin | male | NaN | 0 | 0 | 370372 | 7.7500 | NaN | Q |
| 318 | 319 | 1 | 1 | Wick, Miss. Mary Natalie | female | 31.0 | 0 | 2 | 36928 | 164.8667 | C7 | S |
| 732 | 733 | 0 | 2 | Knight, Mr. Robert J | male | NaN | 0 | 0 | 239855 | 0.0000 | NaN | S |
| 516 | 517 | 1 | 2 | Lemore, Mrs. (Amelia Milley) | female | 34.0 | 0 | 0 | C.A. 34260 | 10.5000 | F33 | S |
| 234 | 235 | 0 | 2 | Leyson, Mr. Robert William Norman | male | 24.0 | 0 | 0 | C.A. 29566 | 10.5000 | NaN | S |
| 85 | 86 | 1 | 3 | Backstrom, Mrs. Karl Alfred (Maria Mathilda Gustafsson) | female | 33.0 | 3 | 0 | 3101278 | 15.8500 | NaN | S |
Dataset B
| | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 172 | 173 | 1 | 3 | Johnson, Miss. Eleanor Ileen | female | 1.00 | 1 | 1 | 347742 | 11.1333 | NaN | S |
| 26 | 27 | 0 | 3 | Emir, Mr. Farred Chehab | male | NaN | 0 | 0 | 2631 | 7.2250 | NaN | C |
| 755 | 756 | 1 | 2 | Hamalainen, Master. Viljo | male | 0.67 | 1 | 1 | 250649 | 14.5000 | NaN | S |
| 706 | 707 | 1 | 2 | Kelly, Mrs. Florence "Fannie" | female | 45.00 | 0 | 0 | 223596 | 13.5000 | NaN | S |
| 591 | 592 | 1 | 1 | Stephenson, Mrs. Walter Bertram (Martha Eustis) | female | 52.00 | 1 | 0 | 36947 | 78.2667 | D20 | C |
| 797 | 798 | 1 | 3 | Osman, Mrs. Mara | female | 31.00 | 0 | 0 | 349244 | 8.6833 | NaN | S |
| 147 | 148 | 0 | 3 | Ford, Miss. Robina Maggie "Ruby" | female | 9.00 | 2 | 2 | W./C. 6608 | 34.3750 | NaN | S |
| 423 | 424 | 0 | 3 | Danbom, Mrs. Ernst Gilbert (Anna Sigrid Maria Brogren) | female | 28.00 | 1 | 1 | 347080 | 14.4000 | NaN | S |
| 517 | 518 | 0 | 3 | Ryan, Mr. Patrick | male | NaN | 0 | 0 | 371110 | 24.1500 | NaN | Q |
| 791 | 792 | 0 | 2 | Gaskell, Mr. Alfred | male | 16.00 | 0 | 0 | 239865 | 26.0000 | NaN | S |
Dataset A
| | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 564 | 565 | 0 | 3 | Meanwell, Miss. (Marion Ogden) | female | NaN | 0 | 0 | SOTON/O.Q. 392087 | 8.0500 | NaN | S |
| 360 | 361 | 0 | 3 | Skoog, Mr. Wilhelm | male | 40.0 | 1 | 4 | 347088 | 27.9000 | NaN | S |
| 620 | 621 | 0 | 3 | Yasbeck, Mr. Antoni | male | 27.0 | 1 | 0 | 2659 | 14.4542 | NaN | C |
| 113 | 114 | 0 | 3 | Jussila, Miss. Katriina | female | 20.0 | 1 | 0 | 4136 | 9.8250 | NaN | S |
| 354 | 355 | 0 | 3 | Yousif, Mr. Wazli | male | NaN | 0 | 0 | 2647 | 7.2250 | NaN | C |
| 269 | 270 | 1 | 1 | Bissette, Miss. Amelia | female | 35.0 | 0 | 0 | PC 17760 | 135.6333 | C99 | S |
| 347 | 348 | 1 | 3 | Davison, Mrs. Thomas Henry (Mary E Finck) | female | NaN | 1 | 0 | 386525 | 16.1000 | NaN | S |
| 565 | 566 | 0 | 3 | Davies, Mr. Alfred J | male | 24.0 | 2 | 0 | A/4 48871 | 24.1500 | NaN | S |
| 349 | 350 | 0 | 3 | Dimic, Mr. Jovan | male | 42.0 | 0 | 0 | 315088 | 8.6625 | NaN | S |
| 571 | 572 | 1 | 1 | Appleton, Mrs. Edward Dale (Charlotte Lamson) | female | 53.0 | 2 | 0 | 11769 | 51.4792 | C101 | S |
Dataset B
| | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 676 | 677 | 0 | 3 | Sawyer, Mr. Frederick Charles | male | 24.5 | 0 | 0 | 342826 | 8.0500 | NaN | S |
| 700 | 701 | 1 | 1 | Astor, Mrs. John Jacob (Madeleine Talmadge Force) | female | 18.0 | 1 | 0 | PC 17757 | 227.5250 | C62 C64 | C |
| 237 | 238 | 1 | 2 | Collyer, Miss. Marjorie "Lottie" | female | 8.0 | 0 | 2 | C.A. 31921 | 26.2500 | NaN | S |
| 856 | 857 | 1 | 1 | Wick, Mrs. George Dennick (Mary Hitchcock) | female | 45.0 | 1 | 1 | 36928 | 164.8667 | NaN | S |
| 39 | 40 | 1 | 3 | Nicola-Yarred, Miss. Jamila | female | 14.0 | 1 | 0 | 2651 | 11.2417 | NaN | C |
| 546 | 547 | 1 | 2 | Beane, Mrs. Edward (Ethel Clarke) | female | 19.0 | 1 | 0 | 2908 | 26.0000 | NaN | S |
| 632 | 633 | 1 | 1 | Stahelin-Maeglin, Dr. Max | male | 32.0 | 0 | 0 | 13214 | 30.5000 | B50 | C |
| 472 | 473 | 1 | 2 | West, Mrs. Edwy Arthur (Ada Mary Worth) | female | 33.0 | 1 | 2 | C.A. 34651 | 27.7500 | NaN | S |
| 624 | 625 | 0 | 3 | Bowen, Mr. David John "Dai" | male | 21.0 | 0 | 0 | 54636 | 16.1000 | NaN | S |
| 219 | 220 | 0 | 2 | Harris, Mr. Walter | male | 30.0 | 0 | 0 | W/C 14208 | 10.5000 | NaN | S |
Duplicate rows
Dataset A
Dataset does not contain duplicate rows.
Dataset B
Dataset does not contain duplicate rows.
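Both datasets report zero duplicate rows, which is expected given that PassengerId is unique in each. For reference, a duplicate-row check can be sketched in a few lines of Python. The helper name `duplicate_rows` and the sample rows are hypothetical, and this is not necessarily how the report performs the check.

```python
from collections import Counter

def duplicate_rows(rows):
    """Return (row, count) for every row that occurs more than once."""
    counts = Counter(tuple(row) for row in rows)
    return [(row, n) for row, n in counts.items() if n > 1]

# Hypothetical rows; the second and third are identical.
rows = [
    (1, "male", 22.0),
    (2, "female", 38.0),
    (2, "female", 38.0),
]

dupes = duplicate_rows(rows)
print(dupes)
```

An empty result corresponds to the "does not contain duplicate rows" message shown for both datasets.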